29. Remove, Repeat
Question:
This word seems like an outlier in a certain sense, so let’s remove it and refit. Go back to text_learning/vectorize_text.py and remove this word from the emails, using the same method you used to remove “sara”, “chris”, etc. Rerun vectorize_text.py and, once it finishes, rerun find_signature.py. Do any other outliers pop up? If so, what word is it, and does it seem like a signature-type word? (As before, define an outlier as a feature with importance > 0.2.)
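The remove-and-recheck loop described above can be sketched roughly as follows. This is a minimal illustration, not the course's provided code: remove_words and find_outliers are hypothetical helper names, and it assumes the words were removed with plain string replacement and that the importances come from a fitted decision tree.

```python
def remove_words(text, words_to_remove):
    """Return text with every occurrence of each listed word deleted.

    Assumption: removal is a plain substring replacement, so a listed
    word is also stripped when it appears inside a longer word.
    """
    for word in words_to_remove:
        text = text.replace(word, "")
    # Collapse the extra whitespace left behind by the deletions.
    return " ".join(text.split())


def find_outliers(importances, feature_names, threshold=0.2):
    """Return (index, name, importance) for each feature whose
    importance is strictly greater than the threshold."""
    return [
        (i, feature_names[i], importance)
        for i, importance in enumerate(importances)
        if importance > threshold
    ]
```

In find_signature.py, the importances would typically be the fitted classifier's feature_importances_ attribute, and the names would come from the vectorizer's vocabulary (get_feature_names_out() in recent scikit-learn releases, get_feature_names() in older ones). Each time a new outlier appears, add it to the removal list in vectorize_text.py and rerun both scripts.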
INSTRUCTOR NOTE:
Special Note: Depending on when you downloaded the code provided for find_signature.py, you may need to change the code in lines 9-10 to be
words_file = "../text_learning/your_word_data.pkl"
authors_file = "../text_learning/your_email_authors.pkl"
so that the files created from running vectorize_text.py are reflected properly.
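As a quick check that the regenerated files are the ones being read, a small loader along these lines may help. It is only a sketch: load_word_data is a hypothetical helper, not part of the provided find_signature.py, and it assumes both pickles hold sequences of equal length (one author label per email).

```python
import pickle


def load_word_data(words_file, authors_file):
    """Load the pickled email texts and author labels.

    Raises ValueError if the two sequences differ in length, which
    would suggest the files came from different runs.
    """
    # Pickle files must be opened in binary mode under Python 3.
    with open(words_file, "rb") as f:
        word_data = pickle.load(f)
    with open(authors_file, "rb") as f:
        authors = pickle.load(f)
    if len(word_data) != len(authors):
        raise ValueError(
            "word_data has %d entries but authors has %d"
            % (len(word_data), len(authors))
        )
    return word_data, authors
```

Called with the two paths from the note above, this returns the texts and labels that vectorize_text.py most recently wrote, or fails loudly if the two files are out of sync.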